Posterior and Computational Uncertainty in Gaussian Processes

Neural Information Processing Systems

Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore, in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
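Claim (ii) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes an RBF kernel, a fixed noise variance of 0.1, and a low-rank approximation `C` of the inverse kernel matrix built from a hypothetical action matrix `S` (here the first few unit vectors). None of these choices is stated in the abstract; they are assumptions made only to show how a combined covariance could split into a mathematical and a computational part.

```python
import numpy as np


def rbf(A, B, lengthscale=1.0):
    """RBF kernel matrix between the rows of A and B (assumed kernel choice)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


rng = np.random.default_rng(0)
n, i = 30, 5                                  # n data points, i units of computation
X = rng.uniform(-3.0, 3.0, size=(n, 1))       # training inputs (synthetic)
X_test = np.linspace(-3.0, 3.0, 7)[:, None]   # test inputs (synthetic)

K_hat = rbf(X, X) + 0.1 * np.eye(n)           # kernel matrix plus assumed noise
k_sX = rbf(X_test, X)
k_ss = rbf(X_test, X_test)

# Hypothetical actions: the first i unit vectors. C approximates inv(K_hat)
# using only the part of K_hat "seen" through S (an assumed construction).
S = np.eye(n)[:, :i]
C = S @ np.linalg.solve(S.T @ K_hat @ S, S.T)
K_inv = np.linalg.inv(K_hat)                  # exact inverse, for comparison only

cov_math = k_ss - k_sX @ K_inv @ k_sX.T       # uncertainty from finite data
cov_comp = k_sX @ (K_inv - C) @ k_sX.T        # uncertainty from finite computation
cov_combined = k_ss - k_sX @ C @ k_sX.T       # what the approximate method reports

print(np.allclose(cov_combined, cov_math + cov_comp))
```

Under these assumptions the computational term is positive semi-definite, so the combined variance at each test point is never smaller than the exact posterior variance.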


Supplementary Material: Posterior and Computational Uncertainty in Gaussian Processes

Jonathan Wenger

Neural Information Processing Systems

This supplementary material contains additional results and in particular proofs for all theoretical statements. [...] Then Algorithm 1 recovers the pivoted Cholesky decomposition, i.e. it holds for all [...] It holds by assumption and eq. [...]

S1.3 Conjugate Gradient Method

Algorithm S3: Preconditioned Conjugate Gradient Method [38]
Input: kernel matrix K̂, labels y, prior mean µ, preconditioner P̂
Output: representer weights v

We prove the claim by induction. [...] By the form of preconditioned deflated CG given in Algorithm 3.6 of Saad et al. [...] This proves the first claim. [...] We prove the claims by induction.
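The inputs and output listed for Algorithm S3 match a standard preconditioned conjugate gradient solve. The sketch below is a textbook version under stated assumptions, not a transcription of Algorithm S3, whose body is not reproduced here: it assumes the representer weights solve K̂ v = y − µ, that µ is the prior mean evaluated at the training inputs, and that the preconditioner is a dense matrix applied with a linear solve. The function name, the RBF kernel, the Jacobi preconditioner, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np


def preconditioned_cg(K_hat, y, mu, P_hat, max_iter=200, tol=1e-8):
    """Approximately solve K_hat v = y - mu with preconditioned CG.

    K_hat and P_hat are assumed symmetric positive definite.
    """
    b = y - mu
    v = np.zeros_like(b)
    r = b.copy()                       # residual b - K_hat v (v starts at zero)
    z = np.linalg.solve(P_hat, r)      # preconditioned residual
    d = z.copy()                       # search direction
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Kd = K_hat @ d
        alpha = rz / (d @ Kd)          # step size along d
        v += alpha * d
        r -= alpha * Kd
        z = np.linalg.solve(P_hat, r)
        rz_new = r @ z
        d = z + (rz_new / rz) * d      # next K_hat-conjugate direction
        rz = rz_new
    return v


# Synthetic example: RBF kernel matrix with assumed noise variance 0.1.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
sq_dists = (X - X.T) ** 2
K_hat = np.exp(-0.5 * sq_dists) + 0.1 * np.eye(50)
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
mu = np.zeros(50)                      # zero prior mean (assumption)
P_hat = np.diag(np.diag(K_hat))        # Jacobi preconditioner (assumption)

v = preconditioned_cg(K_hat, y, mu, P_hat)
print(np.linalg.norm(K_hat @ v - (y - mu)))
```

Stopping the loop early by lowering `max_iter` yields an approximate `v`; the gap between that approximation and the exact solve is the kind of error the paper attributes to limited computation.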

